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ABSTRACT 



The effectiveness of several iterative techniques for solv- 
ing matrix equations resulting from finite difference approxima- 
tions to self-adjoint parabolic and elliptic partial differential 
equations is reviewed. The techniques are: 

1. SIP- Strongly Implicit Procedure 

2. ICCG- Incomplete Cholesky Conjugate Gradient Method 

3. VICCG- Vectorized ICCG 

4. MICCG- Modified ICCG 

5. DSCG- Diagonally Scaled Conjugate Gradient Method 

6. POLCG- Polynomial Preconditioned Conjugate Gradient 

Method. 

7. PICCG- Polynomial form of ICCG 

The comparison is made on a vector machine (two-pipe Cyber 
205) where vectorizat ion of the code is done primarily by the 
vector compiler available. It is found that of the methods 
studied, POLCG and MICCG appear to require the least amount of 
CPU time. An advantage of MICCG over POLCG is that it is less 
sensitive to increasing matrix size. Its disadvantages are that 
it requires an iteration parameter, has a greater set-up time, 
and needs more storage than POLCG. 
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1. Introduction 



In the numerical solution of two and three dimensional 
partial differential equations one is often led to a system of 
linear equations (or a matrix equation) which must be solved. 
Efficient performance of this task by the researcher has become 
complicated with the advent of vector and parallel processing 
machines. Depending upon the algorithm, machine, and problem 
to be solved, an "optimum" strategy taken by the researcher 
should take into consideration properties of various algorithms 
such as their vectorizability and parallelizability . 

The present work is an effort to study the efficiency of 
several iterative methods used in the solution of symmetric 
banded systems of linear equations. They are: Stone 's[l] Strong- 
ly Implicit Procedure (SIP) ; the Incomplete Cholesky Conjugate 
Gradient Method (ICCG) of Meijerink and van der Vorst[2], includ- 
ing its vectorized variant (VICCG (0) ) [ 3 ] , and Gustaf sson ' s [ 4 ] 
modified form (MICCG) ; a diagonally scaled conjugate gradient 
method (DSCG) ; Saad's[5] polynomial preconditioned conjugate 
gradient method (POLCG) ; and finally a polynomial variation of 
ICCG (PICCG) . The machine on which comparisons are made is a 
Cyber 205 which has a peak operating performance of 100 mega- 
flops for vector addition and/or multiplication (200 megaflops 
for linked triadic vector operations) . 

New algorithms are not introduced (although some modifica- 
tions to existing methods are tried) . The purpose of this effort 
is to take a series of known methods, apply them to problems with 
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similar characteristics to what might be found in real physical 
settings (including three dimensional problems of groundwater 
modelling), and determine which method(s) reviewed have superior 
performance on a Cyber 205. 

Little has been written in the water resource journals on 
many of the now available vectorizable iterative techniques for 
solving large sparse matrices. Trescott and Larson [10] studied 
the convergence properties of SIP in comparison to ADI (Alternat- 
ing Direction Implicit) and LSOR(Line Successive Overrelaxation) . 
In 1981 Kuiper[ll] compared SIP to ICCG concluding that for con- 
fined aquifer problems ICCG appeared to be superior while for 
water table conditions (nonlinear) the methods performed compar- 
ably well. More recently Kuiper[12] compared a number of iterative 
techniques on a set of linear and nonlinear groundwater problems. 
The conclusion of his most recent study was that the Picard- 
preconditioned conjugate gradient methods performed best for 
both the linear and nonlinear problems. Each of these papers are 
concerned with performance on scalar machines. 

In the open literature there are several comparisons of 
iterative methods which take into account vectorizability of 
particular algorithms, and their efficiency on given "supercom- 
puters". A few of the more pertinent references are given below. 

Meurant[6] and Jordan[7] have compared VICCG, POLCG, and 
block preconditioned conjugate gradient methods for two dimens- 
ional problems. Meurant's study concluded that for "small" prob- 
lems (grids of 60x60) POLCG seemed optimal on the 2-pipe Cyber 
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205 while for larger problems the block preconditioning methods 
appeared to do better. Jordan preferred polynomial precondition- 
ers from the standpoint of parallel processing. 

Hayami and Harada[8] have recently estimated that on the NEC 
SX-2 supercomputer (assuming an acceleration due to vector pro- 
cessing of 40 times the scalar rate) DSCG will run ten times 
faster than ICCG for most problems. As the Cyber 205 is also a 
highly vectorizable machine DSCG has been included as one of the 
algorithms to be tested. 

Gustaf sson [ 4 ] considered a class of first order methods 
(SSOR and MICCG(n)) which were shown to have markedly better 
convergence properties than ICCG. Extending Gustafsson's MICCG, 
Ashcroft and Grimes [9] applied the algorithm to three dimensional 
problems and in particular programmed it in such a way that (at 
least on a Cray machine) much of the code can be made to run con- 
currently. Their analysis compares MICCG, SSOR, DSCG and Nofill 
Red/black Incomplete factorization preconditioners concluding 
that the ”concurrent” MICCG method, while not as vectorizable as 
some of the other methods, is superior due to its relatively low 
iteration count. 

The present work differs from that previously reported in 
the following respects. The machine used is exclusively a two- 
pipe Cyber 205, both two and three dimensional problems are run, 
two of the five test problems have anisotropic conditions, and 
finally, the set of iterative techniques tested is larger than 
previously reported. 



5 



The matrix equations used in the present comparison arise 
from finite difference approximations to the 2D and 3D partial 
differential equations governing the flow of groundwater in con- 
fined aquifers. These equations involve self adjoint parabolic 
(or elliptic for the case of steady state flow) differential 
operators for which all of the iterative schemes tested are ap- 
plicable. Equations of a similar type result in various transport 
phenomenon such as heat conduction and laser fusion. 

The paper is outlined as follows. In section 2, the five 
model problems are described. This is followed in section 3 by a 
short explanation of the algorithms. Section 4 concludes with 
comparitive results of each on the set of test problems. 
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2. Formulation of the Problems 



The derivation of the three dimensional partial differential 
equation governing the distribution of hydraulic head in a con- 
fined aquifer can be found in Reddell and Sunada[13]. Assuming 
components of the transmissivity tensor lie along coordinate axes 
the equation can be concisely written as: 

(1) Sht = (K^h^)x +(Kyhy)y +(KZh2)z + R 

where h is the pressure head to be determined; S is the storage 
coefficient of the porous media; K^, K^, and are transmis- 
sivities of the aquifer in the x,y, and z directions; and R is 
the volumetric flux of recharge or withdrawal per unit volume of 
water from the aquifer. Sources and sinks are approximated by 
delta functions with strengths equal their volumetric flux. 

The primary boundary conditions are Dirichlet and homogen- 
eous Neumann, which correspond physically to constant head and 
no-flux boundaries respectively. 

In each of the test problems, the aquifer is contained in 
a rectangular solid whose faces are assigned appropriate bound- 
ary conditions. For aquifers of a more general shape, points 
exterior to the aquifer boundaries are assigned zero trans- 
missivities for the purpose of eliminating their role in the 
determination of head values interior to the aquifer (see for 
example [14]). This is done in problem five where an "L” shaped 
region is considered. 

In the first four problems, a steady state solution is 
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sought. Of these the fourth has a nonunique solution since all 
of its boundary conditions are homogeneous Neumann. The initial 
guess for h in each of the first four problems is zero except 
at points where sources/sinks are present or non-homogeneous 
Dirichlet boundary conditions apply. The fifth test problem 
considers a fully time dependent problem, but solves for only 
one time step, given values of the head everywhere for the 
immediately preceding time level: 

(2) h(x,y,z)=102 . 
problem 1 : 

The first problem models an isotropic homogeneous aquifer 
with constant head boundaries. No sources or sinks are present. 
The transmissivities and boundary conditions are: 

K^=Ky=K^=1.0 and 

(3) 

h=l on all boundaries. 

problem 2 : 

The second test problem examines the effect of using trans- 
missivities with linear gradients along each of the coordinate 
axes. As before, no sources or sinks are present. Specification 
of case two is as follows: 

K^=l-16y/17 

Ky=(16x+1)/17 

K^=(96z+1)/10 
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( 4 ) 



hj^ = 0 at x = 0 and z = l 
hj^=l at x=l 
h=10 at y=0, and 
h=l at y=l,z=0. 



problem 3: 

Refer to figure 1 in the description of the third test prob- 
lem. There are four subsections of the aquifer which are isotrop- 
ic and homogeneous, but whose transmissivities lie in the range 
zero to forty. A feature of this problem which makes it difficult 
is that one of the subsections is an impermeable layer with a 
relatively small transmitting aperture. A source of strength 1.0 
is positioned at (. 875 ,. 9375 ,. 1875) , and a sink of strength .25 
is located at ( . 4375 , . 625 , . 5) . Specification of case three is: 

K^=Ky=K^=1.0 in region A 
K>^=Ky=K^=0. 0 in B 
K^=Ky=K^=20. 0 in C 
K^ = Ky = K^ = 4 0.0 in D 



(5) 



hj^=0.0 at y=0,z=l 



hj^=-1.0 at x=l 
h=l at y=l, z=0, and 
h=5 at x=0. 

The center point of the impermeable shell (region B) is at 
(.5, .4375, .5) with side length .75 and width .125. The center- 
point for region C is the same as for B but has a side length 
of .375. The center for region D is (.5,. 9,. 5) with side length 
.125, and finally the aperture is centered at (.75,. 5,. 5) with 
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side length in the y and z directions of .25, and width .125. 
problem 4: 

Test case four is a two dimensional example taken from 
Stones 's paper [1]. Two dimensional problems are solved by the 
three dimensional code by keeping boundary conditions and trans- 
missivities independent of the z coordinate. See figure 2 for an 
illustration of the domain modelled. Specifications are: 

K^=l everywhere 
K^=Ry=l in region A 

( 6 ) 

K^=l, Ky=100 in B 
K^=100, K^=l in C 
K^=Ky=K^=0 in D, and 
hj^=0 on all boundaries. 

Line sources for problem four were located at (x,y) equal to 
(.1,.1),(.1,.9), (.767, .133) with strengths 1.0,. 5 and .6 re- 

spectively; while sinks were located at (.467,. 5) and (.9,. 9) 
with strengths 1.83 and .27. They are represented in figure 2 
as (+) 's and (-) 's. 
problem 5: 

Problem five can be found in Kershaw's paper[15]. It is 
two dimensional and is without sources or sinks. In figure 3, 

D or N at a boundary signifies homogeneous Dirichlet or homo- 
geneous Neumann conditions respectively. Equation (2) gives the 
initial condition for this problem, and the transmissivities are 

K^=Ky=K^=10”^ in region A 
K^=Ky=K^=10"2 in B 
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(7) 

K^=Ky=K^=10 in C, and 
K^=Ky=K^=10^ in D. 

3. Summary of the Numerical Methods 

Equation (1) is approximated by a node-centered finite dif- 
ference formulation. The truncation error is second order in the 
spatial increments if a uniform mesh is used. The accuracy drops 
to first order if the mesh is nonuniform. The boundary conditions 
are handled in such a way that the second order truncation error 
is maintained regardless of the spacing. 

By employing the "natural" ordering of the grid points a 
seven banded symmetric matrix equation results, 

( 8 ) Ah=b 

where A is positive semi-definite, b represents known boundary 
conditions and possible sources or sinks, and h is the vector of 
unknown pressure head values to be solved for. 

Solving for h involves iterating on an initial guess to a 
final solution which in some sense must be close to the exact 
answer. Defining the m^^ iterate as h^^), an error vector 
associated with the m^^ iterate is introduced: 

(9) e"'=b-Ah^’^) 

The inf-norm of scaled to defines the m^^ residual 

used in the results section of this paper. The iterative proc- 
ess is halted when the absolute magnitude of the scaled residual 

• —ft 

IS less than a tolerance level of 10 °. 

The iterative algorithms are outlined below. For a complete 
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description of any particular method the reader is referred to 
the original works from which the methods have been extracted. 

SIP ; 

The original version of Stone' s[l] SIP was extended to three 
dimensional problems by Weinstein et al[16]. The method splits 
up the matrix A(eq. 8) into an approximate LU factorization, where 
L and U are upper and lower triangular matrices. The product 
of these two matrices, however, yields thirteen nonzero diagonals 
rather than seven which were originally in A. Two term Taylor 
series corrections (weighted by an iteration parameter) are 
"lumped" into the original LU factors producing a revised fact- 
orization which minimizes somewhat the effects of the concom- 
mitant diagonals. 

Values for the iteration parameters range between zero and 
one, with values close to one being near optimal. Careful sel- 
ection of this parameter is critical. Too large a value could 
cause divergence while too small a value may lead to very slow 
convergence. A cycle of 3 parameters was selected (see eg 10) 
where is found by employing the formula given in Wein- 

stein et al[16] (using an average value, rather than a maximum 
value over the entire domain) . 

(10) l-“t=(l-«max)^'^^' t=l,2,3. 

A refinement of SIP involves renumbering the nodes in pro- 
ducing the matrix equation (8) . Weinstein et al use two out of a 
a possible four distinct numberings that can be made by re- 



12 



versing the order of the nodes along one or two of the coord- 
inate axes. Including a second renumbering of the A matrix and 
performing two approximate inversions of A for each SIP iteration 
was found to increase the total computation time required to con- 
verge to a solution. In addition, if (as is done here) the L and 
U matrices are stored for each renumbering of the A matrix (as 
well as for each a^) rather than recalculated at each SIP 
iteration, the amount of storage required can become quite large. 
For these reasons only the natural ordering of the nodes is used. 
Conjugate Gradient Methods : 

With the recent introduction of incomplete Cholesky pre- 
conditioners [ 2 ] to the basic conjugate gradient algorithm [ 17 , 
18,19], there has been a renewal of interest in this algorithm 
and an ongoing search for ''optimal” preconditioners [ 6 , 7 , 8 , 9 ] . 

Conjugate gradient methods differ from SIP in that they 
are based upon the assumption that A is a symmetric matrix. 

SIP has no such restriction. Variations of preconditioned 
conjugate gradient methods do exist for nonsymmetric matrices 
[20,21] but the theory and application is significantly dif- 
ferent than for the symmetric case. 

The incomplete Cholesky conjugate gradient method of 
Meijerink and van der Vorst[2], involves a similar factorization 
of the A matrix as found in SIP. It is given by: 

(11) AaLDL*^, 

where D is a diagonal matrix, and L is a lower triangular matrix 
whose nonzero diagonals match positions of those in the A matrix. 
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The product of LU from SIP and LDL*^ from ICCG are ident- 
ical provided the iteration parameter used in SIP is set to zero. 
The ICCG factorization neglects the six additional diagonals 
which result in the product of LDL*^. Alternative factoriza- 
tions include extra diagonals in the L matrix in an effort to 
give a more accurate Cholesky decomposition, but these are not 
considered here. Such variants tend to increase the storage 
reguirements of the method and the amount of recursive work in 
each iteration. This was observed by Kershaw[15]. 

Gustafsson[4] improved on ICCG by incorporating Stone's idea 
of minimizing the effect of concommitant diagonals produced in 
the LDL*^ factorization. Unlike Stone, who uses a two term Taylor 
expansion of the unwanted terms of the factorization, Gustafsson 
employs a one term Taylor expansion. This guarantees that a sym- 
metric factorization can be found and that the conjugate grad- 
ient algorithm remains applicable. Gustafsson 's method is refer- 
red to as MICCG. 

Just as in Stone's method an iteration parameter is used 
which ranges in value between zero and one. MICCG was found to 
be less sensitive to the actual value of the iteration parameter 
than is SIP. For each of the five problems tested, a trial and 
error system was used to find optimal values (the magnitudes 
ranged between .95 and 1.0). 

The advent of vector processing machines has resulted in ef- 
forts to modify the inherent recursion involved in inverting the 
LDL*^ approximation to the A matrix. This was partially ac- 
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complished by van der Vorst[3] who split the lower triangular 
matrix L into diagonal blocks which could then be individually 
inverted via truncated Neumann series. 

Van der Vorst's method of vectorizing the recursive part of 
the iteration loop has been employed in "vectorizing" MICCG. The 
resultant algorithm is referred to in this paper as VICCG and 
should not be confused with van der Vorst's vectorized ICCG. As 
with MICCG an iteration parameter must be provided and in general 
is found to have a magnitude less than the optimal value found 
for MICCG. It is also found by trial and error. 

One final variant of the MICCG algorithm involves using a 
three term truncated Neumann series approximation for invert- 
ing the lower triangular matrix L in the LDl"^ factorization. 

While this variation eliminates recursion altogether in the 
iteration loop, it was found that for the fastest convergence 
the "optimal" iteration parameter was usually zero. The re- 
sulting algorithm therefore is actually a polynomial form of the 
unmodified original ICCG method. It is here referred to as PICCG. 

An alternative to factoring the A matrix to obtain a pre- 
conditioner, involves approximating A's inverse by a matrix 
polynomial [ 22 ] . Recently, Dubois et al[23] have reported success 
in using a truncated Neumann series expansion in A for an approx- 
imation to A's inverse. Johnson et al[24] have proposed a poly- 
nomial preconditioner whose coefficients are determined in an 
effort to reduce the condition number of the product of A and 
its preconditioner. Minimizing this number is important since 
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theoretically the rate of convergence of the conjugate gradient 
algorithm increases as the magnitude of the condition number is 
reduced. 

Saad's[5] method is a variant of the method of Johnson et 
al. It attempts to minimize 

r^max 

( 12 ) oJ (l-s(M) ) ^w(M)dM 

over all polynomials s(/i) less than or equal to a specified 
degree, and where Mmax largest eigenvalue of the matrix 

A. In the present work a polynomial of third degree is sought, 
with weight factor: 

(13) w(M)=(Mmax"^)~^^~^- 

Saad originally proposed finding Mmax employing Gershgorin 

circles[25]. In the present work Mmax fixed at 2 which is near 

the value found by employing Gershgorin 's theorem to matrices 
(of the type considered here) which have been diagonally scaled. 
In almost every instance the choice of 2 as an upper bound over 
that found via Gershgorin 's theorem produced faster convergence. 
This algorithm is referred to as POLCG. 

If instead of a polynomial of order three one of zero^^ 
order is selected, and the A matrix is diagonally scaled, the 
preconditioner for the conjugate gradient algorithm becomes the 
identity matrix. This simple preconditioner is referred to here 
as DSCG . 
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4 . Results 



Results of the tests are presented in tables I and II of 
this section. Included are: timings, iteration parameters, 
iteration counts, and megaflop rate for the iteration loop of 
each algorithm. In table II, results from a subset of the prob- 
lems listed in table I which have been modified to increase 
the number of nodes and thereby the size of matrix equation to 
be solved are given. 

The categories in the tables are relatively self explan- 
atory. By set-up time is meant the time necessary to set up the 
matrix equation and to initiate the iteration loop. The megaflop 
rate was found by dividing the total number of operations per- 
formed by the average CPU time needed to complete one iteration 
loop. Parenthetical values alongside those of VICCG are results 
of setting its iteration parameter to zero. This produces a code 
which mimics van der Vorst's[3] vectorized version of ICCG. 

The burden of vectorizing each of the algorithms was 
carried primarily by the Cyber 205 vector compiler with scalar 
optimizer. Few special Q8 calls are used. In particular they 
are Q8MAX and Q8SD0T. Q8MAX is used by each of the schemes 
for computing the inf-norm of the residual vector for conver- 
gence checking. Q8SD0T gives the scalar dot product of two 
vectors and is used solely by the conjugate gradient codes. 
Special "chaining" of do loops for the purpose of increasing 
the number of linked triadic operations was not done. 

Diagonal scaling of the A matrix was performed only on 
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POLCG and DSCG. To find the scaled residual as defined by (9) 
the computed residuals are first multiplied by the diagonal 
scaling factors to find "true" values of the residuals. Scaling 
the remaining conjugate gradient algorithms as outlined by 
Eisenstat[26] would not reduce the number of calculations in 
the present instance because the same number of operations 
saved in one portion of the algorithm would need to be per- 
formed in finding the "true” residual as given in (9) . 

Two of the codes seem to perform better than all others 
in table I. They are MICCG and POLCG, which have the fastest 
iteration times on 2 and 3 of the test problems respectively. 

SIP performed relatively poorly on all but the first two test 
cases. (If the tolerance level for convergence changes from 
10“® to 10“^ SIP does dramatically better-particularly 
on problem 2-but for such a stopping criterion SIP gives in- 
correct results for problem 3. Given this larger tolerance level, 
POLCG and SIP performed the best on two problems each.) DSCG and 
PICCG did better than both ICCG and VICCG for the problems in 
table I leading one to believe that vectorizing the ICCG code in 
the manner of van der Vorst is not particularly efficient for 
small three dimensional problems because the vector lengths are 
not long enough. The situation for VICCG is worse if an iteration 
parameter is not used (as it is for MICCG) to improve on the iter- 
ation count. The phenomenal timings of SIP and MICCG for test 
problem 1 indicate that the preconditioner (for MICCG) and the LU 
factorization (for SIP) are nearly exact inverses of the original 
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matrix. (The exact answer for test problem one was a constant 
head value throughout the domain.) 

Table I highlights some striking differences between POLCG 
and MICCG. POLCG has nearly three times the megaflop rate of 
MICCG while MICCG requires on average only half as many itera- 
tions. On a scalar machine POLCG could not compete with MICCG 
because it requires more iterations and actually performs more 
operations in its inner loop than MICCG. The vectorizability of 
POLCG allows it to be competitive with MICCG on a Cyber 205. 

Continuing the comparison between POLCG and MICCG it is 
noted that for "optimal" convergence in MICCG a suitable iter- 
ation parameter must be found while for POLCG no such parameter 
is needed. Ashcroft and Grimes [9] recommend an optimal iter- 
ation parameter of just less than unity for MICCG based upon 
empirical evidence. While these observations appear to be gen- 
erally true, there are exceptions. In the first test problem 
if a value of .999 is used rather than 1.0 the iteration count 
increases from 2 to 16. While problem 1 is a special case, in- 
stances occur where the value of an "optimal" iteration parameter 
is rather sharply defined (in particular when modelling aniso- 
tropic media) . For such cases several runs of MICCG may be re- 
quired to ensure that a given choice of iteration parameter is 
not far from optimal. For the "larger" problems reported in 
table II, it was found that the iteration count varied from 61 
to 42 to 51 as the iteration parameter for MICCG changed from 
.95 to .994 to .9996 on problem 2. Similarly on problem 4 of 
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table II, altering the iteration parameter of MICCG from .95 to 
.999 to .9995 produced iteration counts of 102, 63, and 66 
respectively . 

The neccesity of an iteration parameter for MICCG may 
become a problem if the matrix equation is periodically updated 
(as is done when modelling groundwater problems for which water 
table conditions apply) . If revision of the matrix equation is 
frequently done during the solution of a time dependent problem 
one would expect some variation in the value of the "optimal" 
iteration parameter. To find its value at each update would be 
impractical, while using the same value throughout the itera- 
tions may or may not be optimal for the whole problem. The only 
way to be assured of its "optimality" is to run a series of tests 
on the complete time dependent problem for a variety of iter- 
ation parameters. Analysis of a single time step is not suffic- 
ient. 

A further disadvantage in using MICCG rather than POLCG on 
problems requiring frequent updates is that the set-up time for 
MICCG is roughly three times that of POLCG. 

An observable advantage in using MICCG over all of the other 
methods has been previously reported[9] and is confirmed in the 
tests reported here. Comparing iteration counts in problems 2 
and 4 of table I with those in table II, it can be seen that 
the increase for MICCG compared to that for all of the other 
methods is much less. The rate at which the number of iterations 
increases as a function of N (number of nodes along one dimen- 
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sion of the numerical grid) is O(N^) for MICCG while for 
POLCG and ICCG the increase appears to be 0(N). The rate for 
VICCG falls somewhere in between depending upon the test problem. 

VICCG improves substantially as the size of the problem 
increases as can be seen by comparing table I results to those of 
table II. However in cases of strong anisotropy, as is present in 
test problem 4, VICCG has some definite difficulty. In fact the 
presence of an iteration parameter in the algorithm is hardly 
warranted. 

In conclusion, it would appear that for relatively small 
domains (on the order of a few thousand unknowns) POLCG is a very 
good algorithm as applied to the solution of two and three dim- 
ensional groundwater flow equations. For larger problems MICCG is 
perhaps a better choice when circumstances are such that frequent 
updates of the matrix equation are unnecessary. As the number of 
unknowns continues to increase, VICCG is an attractive alterna- 
tive to MICCG because of its vectorizability . However, strongly 
anisotropic conditions seem to severely increase VICCG 's iter- 
ation count and for such problems MICCG may give superior per- 
formance . 
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Table 1 

Problem 1 (4913 nodes) 





Set up Iteration 


Number of 


Iteration 


Megaflop 


Algorithm 


time 


parameter 


iterations 


time 




rate 


SIP 


.157 


1.0 


2 


.007 




20 


ICCG 


.038 


n/a 


19 


.107 




28 


POLCG 


.029 


n/a 


13 


. 039 




92 


VICCG 


.099( . 


099) .9(0.0) 


19(21) 


.223(.252) 


21 


MICCG 


.091 


1.0 


2 


.001 




28 


DSCG 


. 026 


n/a 


39 


.045 




94 


PICCG 


. 039 


n/a 


21 


.050 




89 


Problem 2 


(4913 


nodes) 












Set up 


1 Iteration 


Number of 


Iteration 


Mega flop 


Algorithm 


time 


parameter 


iterations 


time 




rate 


SIP 


. 155 


.9994 


31 


.206 




20 


ICCG 


. 038 


n/a 


44 


.278 




28 


POLCG 


.028 


n/a 


55 


. 183 




92 


VICCG 


.097(. 


097) .98(0.0) 


30(44) 


.367(.556) 


21 


MICCG 


.089 


.975 


29 


. 184 




28 


DSCG 


. 026 


n/a 


178 


.211 




94 


PICCG 


.036 


n/a 


88 


.224 




89 


Problem 3 


(4913 


nodes) 












Set up 


Iteration 


Number of 


Iteration 


Megaflop 


Algorithm 


time 


parameter 


iterations 


time 




rate 


SIP 


. 147 


.9974 


524 


3.585 




20 


ICCG 


.038 


n/a 


45 


. 278 




28 


POLCG 


.029 


n/a 


39 


. 128 




92 


VICCG 


.090(. 


090) .92(0.0) 


40(48) 


.500( .609) 


21 


MICCG 


.082 


.95 


31 


. 182 




28 


DSCG 


. 026 


n/a 


124 


. 146 




94 


PICCG 


. 037 


n/a 


52 


. 131 




89 


Problem 4 


(1922 


nodes) 












Set up 


Iteration 


Number of 


Iteration 


Megaflop 


Algorithm 


time 


parameter 


iterations 


time 




rate 


SIP 


. 042 


.9994 


127 


.315 




18 


ICCG 


. 017 


n/a 


72 


. 166 




26 


POLCG 


. 014 


n/a 


103 


. 128 




85 


VICCG 


. 043 ( . 


043) .4(0.0) 


114(116) 


.271(.275) 


26 


MICCG 


.043 


.9953 


43 


. 101 




26 


DSCG 


. 013 


n/a 


336 


.150 




90 


PICCG 


.017 


n/a 


177 


.167 




83 



22 



Table 1 (continued) 
Problem 5 (5202 nodes) 







Set up 


Iteration 


Number of 


Iteration 


Megaflop 


Algorithm 


time 


parameter 


iterations 


time 




rate 


SIP 




. 166 


.9998 


189 


1.241 




19 


ICCG 




.046 


n/a 


58 


.373 




25 


POLCG 




. 038 


n/a 


52 


. 160 




93 


VICCG 




. 102 ( . 


102) .9(0.0) 


47(59) 


.224(.282) 


47 


MICCG 




.105 


.95 


35 


.212 




27 


DSCG 




.035 


n/a 


162 


. 186 




94 


PICCG 




.046 


n/a 


67 


. 162 




87 


Table 2 
















Problem 


2 


(35937 


nodes) 














Set up 


Iteration 


Number of 


Iteration 


Megaflop 


Algorithm 


time 


parameter 


iterations 


time 




rate 


SIP 




1.034 


.9994 


169 


8.562 




19 


ICCG 




.207 


n/a 


84 


3.995 




27 


POLCG 




. 150 


n/a 


139 


3.768 




84 


VICCG 




.570(. 


570) .996(0.0) 


47 (84) 


2.792(5.082) 


33 


MICCG 




. 566 


.994 


42 


1.967 




27 


DSCG 




. 136 


n/a 


457 


4 .306 




87 


PICCG 




.207 


n/a 


290 


6.254 




78 


Problem 


4 


(7442 


nodes) 














Set up 


Iteration 


Number of 


Iteration 


Mega flop 


Algorithm 


time 


parameter 


iterations 


time 




rate 


SIP 




.251 


.994 


223 


2.120 




19 


ICCG 




.745 


n/a 


145 


1.298 




26 


POLCG 




.065 


n/a 


193 


.855 




93 


VICCG 




. 154 ( . 


154) .4(0.0) 


221(232) 


1.444 (1.510) 


50 


MICCG 




. 174 


.9988 


63 


.589 




26 


DSCG 




. 058 


n/a 


640 


1.056 




94 


PICCG 




.073 


n/a 


328 


1.132 




88 
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